Long-term community stability: accounting for detection error to understand the contributions of current and past environmental conditions to community states
Collaborator update - November 2023
Motivation
In this study, we’re interested in understanding how long-term datasets can help us understand how communities respond to environmental change. We’re also interested in accounting for detection error in this process - a practice that is well established in some sub-fields of ecology (e.g., wildlife ecology, mostly birds) but less so in others (e.g., community ecology, especially outside of birds). Our aim is a two-part modeling process: first, account for detection error in community datasets; then, derive standard values of community change/stability (e.g., dissimilarity) from those datasets and use them in a second model in which we ask how the environment (both current and past) shapes community change.
Guiding questions
Q1: How are estimates of community stability shaped by detection error?
Q2: How is community stability through time shaped by environmental factors?
Approach
In our two-part modeling framework, we first input raw survey data from a community, along with covariates of detection, into a multi-species occupancy or abundance model. We then derive mean values of community change (‘beta diversity’) along with uncertainty in these values (standard deviation). Finally, we input these values into a regression model that incorporates current and lagged effects of environmental covariates via the stochastic antecedent modeling (SAM) framework to understand how the environment (both biotic and abiotic) shapes community change/stability.
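The step linking the two models can be sketched as follows. This is a minimal illustration, not our actual code: it assumes posterior abundance draws (from the detection model) for one site in two years, computes Bray-Curtis dissimilarity per draw, and summarizes the mean and standard deviation that would feed the SAM regression. All array shapes and the toy Poisson data are hypothetical.

```python
import numpy as np

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    denom = x.sum() + y.sum()
    return np.abs(x - y).sum() / denom if denom > 0 else 0.0

def dissimilarity_summary(draws_t1, draws_t2):
    """Mean and SD of dissimilarity across posterior draws.

    draws_t1, draws_t2: arrays of shape (n_draws, n_species) --
    posterior abundance draws for one site in two years.
    """
    d = np.array([bray_curtis(a, b) for a, b in zip(draws_t1, draws_t2)])
    return d.mean(), d.std()

# toy example: 1000 posterior draws, 5 species
rng = np.random.default_rng(42)
t1 = rng.poisson(5.0, size=(1000, 5))
t2 = rng.poisson(8.0, size=(1000, 5))
mean_d, sd_d = dissimilarity_summary(t1, t2)
```

Computing the dissimilarity within each posterior draw (rather than on posterior means) is what carries the detection-model uncertainty forward into the regression stage.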
Datasets
Our aim is to demonstrate the general utility of the modeling process as a way to unify community ecology sub-disciplines that have historical differences in how they analyze data. Thus, we have compiled a set of four datasets spanning taxonomic groups and environments to highlight the breadth of our approach.
Santa Barbara Channel LTER fish community
Our first dataset is from surveys of fish in the Santa Barbara Channel LTER (SBC LTER) that span from 2000-2022.
- Taxa: Fish
- Environment: Marine - Kelp forest
- Years: 23
- Number of sites: 43
- Number of species: 63
- Data Type: Abundance
- Detection covariates: fish size and dive visibility
- Environmental covariates: seasonal temperature and annual giant kelp biomass
Konza Prairie LTER bird community
The second dataset is from surveys of grassland birds in the Konza Prairie LTER (KNZ LTER) that span from 1981-2009.
- Taxa: Birds
- Environment: Terrestrial - Tallgrass prairie
- Years: 28
- Number of sites: 3-11 (still WIP)
- Number of species: TBD
- Data Type: Abundance
- Detection covariates: bird size and survey length
- Environmental covariates: seasonal temperature and precipitation (and potentially annual plant biomass)
Sevilleta LTER grasshopper community
The third dataset is from surveys of grasshoppers in the Sevilleta LTER (SEV LTER) that span from 1992-2019.
- Taxa: Insects
- Environment: Terrestrial - Blue grama grassland and Creosote shrubland
- Years: 27
- Number of sites: 60
- Number of species: 46
- Data Type: Abundance
- Detection covariates: none (none were provided in the metadata, and others, e.g. body size, were not easy to derive from a literature review)
- Environmental covariates: seasonal temperature and plant biomass
Petrified Forest National Park plant community
The final dataset is a set of surveys of understory plant communities in Petrified Forest National Park (PFNP) that span from 2007-2022.
- Taxa: Plants
- Environment: Terrestrial - Grassland and shrubland
- Years: 15
- Number of sites: 10 (subset because of computation time)
- Number of species: TBD
- Data Type: Presence-Absence (Detection/Nondetection)
- Detection covariates: cover class
- Environmental covariates: seasonal precipitation and VPD
Progress so far
SBC LTER fish dataset
- Detection model: complete
- SAM regression model: complete
KNZ LTER bird dataset
- Detection model: WIP
- SAM regression model: not run
SEV LTER grasshopper dataset
- Detection model: complete
- SAM regression model: complete
PFNP plant dataset
- Detection model: WIP
- SAM regression model: not run
Potential figures and follow-up analyses
Following is a set of potential figures we could consider including in the paper (or supplements), along with follow-up analyses associated with them.
Q1: Detection error
Covariates driving detection
When thinking about detection, we could look at how covariates we input into the model influence our ability to detect species:
Note: There were no covariates for detection easily available for the grasshopper dataset, so there is no panel of this potential figure for that dataset.
Accounting for detection error in estimates of change
We could also look at how observed versus modeled estimates of dissimilarity compare:
In this case, we could perform a post-hoc analysis of whether estimates of dissimilarity change with data type (observed versus modeled) and whether this varies by dataset. This could be a quick frequentist model (a glmm with a random effect of site-year as a “repeated measures” term and covariates of data type, dataset ID, and their interaction) or could be a follow-up Bayesian model.
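The quick frequentist version of that post-hoc analysis could look something like the sketch below. Everything here is hypothetical: the column names (`dissim`, `data_type`, `dataset`, `site_year`) and the simulated data are stand-ins, and the logit transform of the bounded dissimilarity response is one option among several (a beta regression would be another).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# hypothetical long-format table: one row per site-year x data type
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "dissim": rng.beta(2, 5, n),                     # dissimilarity in (0, 1)
    "data_type": rng.choice(["observed", "modeled"], n),
    "dataset": rng.choice(["SBC", "KNZ", "SEV", "PFNP"], n),
    "site_year": rng.choice([f"s{i}" for i in range(20)], n),
})
# logit transform keeps the bounded response roughly Gaussian
df["y"] = np.log(df["dissim"] / (1 - df["dissim"]))

# mixed model: data type x dataset fixed effects, site-year random intercept
model = smf.mixedlm("y ~ data_type * dataset", df, groups=df["site_year"])
fit = model.fit()
```

The `data_type * dataset` interaction is what tests whether the observed-vs-modeled gap differs among the four datasets.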
We could also explore different ways of defining what “observed” data are. Right now, I’m taking the maximum number of individuals observed per species in each site-year combo as the “observed” value, but this assumes that observers survey each site more than once each year. That is often not the case; often we go out once and call it good. So this figure/analysis could include another definition of “observed” data: within each dataset, for each site-year combo, take the single survey with the maximum number of individuals/species observed (assuming observers go out at the time that maximizes detection in their system), rather than summarizing across surveys as I have done. This method will likely produce an even larger spread within the observed data than what you see here.
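The two summarization choices can be made concrete with a toy example. This is a sketch only, with hypothetical column names and counts: method 1 is the current per-species maximum across visits, and method 2 keeps the single visit with the highest total count.

```python
import pandas as pd

# hypothetical raw surveys: two visits to one site in one year
raw = pd.DataFrame({
    "site": ["A"] * 6,
    "year": [2000] * 6,
    "visit": [1, 1, 1, 2, 2, 2],
    "species": ["sp1", "sp2", "sp3"] * 2,
    "count": [4, 0, 2, 1, 3, 5],
})

# Method 1 (current): per-species maximum across visits
per_species_max = (raw.groupby(["site", "year", "species"])["count"]
                      .max().rename("observed"))

# Method 2 (alternative): keep the single visit with the highest total count
totals = raw.groupby(["site", "year", "visit"])["count"].sum()
best_visit = totals.groupby(level=["site", "year"]).idxmax()
best_rows = raw.set_index(["site", "year", "visit"]).loc[best_visit.tolist()]
```

In this toy case the two methods disagree: method 1 mixes counts from different visits (taking sp1 from visit 1 but sp2 and sp3 from visit 2), while method 2 keeps visit 2 intact, which is why the alternative definition should widen the spread of the observed values.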
Temporal trends in change with and without detection error
We might also illustrate how variable the two types of data (observed versus modeled) are over time, using a few sites as illustration:
Qualitative assessment of detection probabilities
Depending on whether datasets greatly vary in their observed versus modeled estimates, we could do a qualitative assessment of whether this is related to relative “rarity” in the dataset, based on the distribution of detection probabilities for each species in the dataset:
Q2: Environmental drivers
For regression models, we could illustrate the effects of all covariates in the model:
All covariate effects
Significant covariate effects and importance weights
We could also demonstrate how important variables shape community change, and whether these effects are relatively instantaneous or lagged:
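The instantaneous-versus-lagged distinction comes from how SAM builds antecedent covariates: a weighted sum of current and past covariate values, where the estimated importance weights say how much each lag contributes. The sketch below only shows how an antecedent covariate is constructed for a given weight vector (in the actual model the weights are estimated); the temperature series and weights here are made up.

```python
import numpy as np

def antecedent(x, w):
    """Antecedent covariate: weighted sum of current and lagged values.

    x: covariate time series (oldest to newest); w: importance weights
    over lags 0..L (w[0] = current year), summing to 1.
    """
    w = np.asarray(w)
    assert np.isclose(w.sum(), 1.0)
    lags = len(w)
    # for each year t with a full lag window, combine x[t], x[t-1], ...
    return np.array([np.dot(w, x[t::-1][:lags]) for t in range(lags - 1, len(x))])

temp = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up annual temperatures
w = [0.5, 0.3, 0.2]                           # mostly instantaneous effect
ante = antecedent(temp, w)                    # one value per usable year
```

A weight vector concentrated at lag 0 (like this one) implies a mostly instantaneous effect; weight piled on later lags implies a lagged effect, which is what the importance-weight figure would visualize.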
Figure feedback
We are limited to 6 figures and tables in the paper, and many of these figures could be multi-panel to highlight all four datasets. Do you all have thoughts on which figures (or multi-panel figures) you think belong in the main text versus in the supplement? I have some thoughts, but would love your feedback as well.
Next steps
Shelby and I are currently working on getting the final datasets through this workflow.
An and I are drafting the manuscript to circulate.
Timeline
We have a tight timeline coming up in the next couple of months, so thank you all in advance for any and all contributions! Here is the timeline we are aiming for, with important checkpoints along the way:
- Mid-November: Draft of manuscript sent to all collaborators
- November 30: Draft back to myself and An to incorporate edits
- December 15: Draft back out to co-authors for second review
- December 30: Draft back to myself and An to incorporate edits
- January 15: Submission deadline